Compositions and methods for metagenome biomarker detection

ABSTRACT

The present invention provides compositions and methods for the multiplex detection of biomarkers in an environmental, non-biological or biological sample. Compositions and methods are provided for simultaneously detecting and identifying multiple pathogens, including viruses, bacteria, fungi, protozoa and helminths, present in a sample.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 15/301,539 filed Oct. 3, 2016, now issued as U.S. Pat. No. 10,883,145, which is a 35 U.S.C. § 371 national phase application from, and claims priority to, International Application No. PCT/US2015/025415, filed Apr. 10, 2015, and published under PCT Article 21(2) in English, which claims priority and the benefit to U.S. Provisional Application No. 61/978,333, filed Apr. 11, 2014, all of which applications are incorporated herein by reference in their entireties.

BACKGROUND OF THE INVENTION

The need to identify pathogenic organisms, including viruses, bacteria, fungi, helminths, and protozoa, has grown more acute in recent years. The complexity associated with the identification of pathogens linked to disease is daunting and the ability to detect these agents is of utmost importance to medical, veterinary and agricultural science. An important factor in the clinical management of infectious diseases lies in the establishment of the identity of the etiologic agent or pathogen responsible for the infection. In most instances, the identification of the infecting microbe is central to making decisions for appropriate therapy and care. In this regard, the attending physician necessarily relies upon the clinical microbiology laboratory to provide the essential information required to initiate a rational regimen of treatment. Challenges relating to current procedures for detecting pathogens in a clinical setting include the need to perform assays directly from clinical specimens or samples, the time required for detection, a possible inability to cultivate the infectious agent, difficulties regarding the detection of rare or unknown infectious agents, and difficulties associated with identifying an infectious agent among others that present similar symptoms.

In another example, there remain considerable challenges in food safety due to continual pathogen exposures in the food chain (Scallan et al., “Foodborne illness acquired in the United States—major pathogens.” Emerg Infect Dis. 2011; 17(1): 7-15). There are 48 million cases of foodborne illness reported each year in the US, resulting in approximately 128,000 hospitalizations and 3,000 deaths. The economic and healthcare impact is significant, estimated at $152 billion. Food safety testing has become important for identifying and removing sources of food that have been contaminated by pathogen exposure. The market for food safety testing was $3.3 billion in 2011 and is projected to continue experiencing compelling growth through 2017. The number of pathogens linked to disease in animals and plants is extensive, and current screening assays are not capable of expeditiously and simultaneously screening for a large number of pathogens in a single assay.

As de novo cataloging expands the count of species in the human microbiome and characterizes their distributions, metagenomic tools are needed to efficiently identify an agent strongly associated with a disease. The ability to assess a microbiome will be necessary to understand interactions between pathogens, and pathogen interactions with commensal organisms, host genetics, and environmental factors. In 2008, over 2 million cases of cancer worldwide (20% of all tumors) were associated with one of ten infectious agents: seven viruses (papillomavirus, hepatitis B or C, Epstein-Barr virus, human herpesvirus 8, and T-cell leukemia virus type 1), one bacterium (Helicobacter pylori), and two helminths (schistosomes and liver flukes) which are major contributors to the cancers as etiological agents (de Martel et al. Lancet Oncol 2012; 13(6):607-615). Considering the thousands of species that comprise the normal human microbiome (Relman. Nature 2012; 486(7402):194-195), it is likely that microorganism communities substantially influence normal physiology as well as the causes of and responses to diseases (Laass et al. Autoimmun Rev 2014), including cancer. These effects are the subject of intense investigation in tissues known to have resident microbiomes such as the gastrointestinal tract (Laass et al. Autoimmun Rev 2014; Major and Spiller. Curr Opin Endocrinol Diabetes Obes 2014; 21(1):15-21; Schwarzberg et al. PLoS One 2014; 9(1):e86708; Scharschmidt and Fischbach. Drug Discov Today Dis Mech 2013; 10(3-4)), skin (Scharschmidt and Fischbach. Drug Discov Today Dis Mech 2013; 10(3-4)) and airway (Martinez et al. Ann Am Thorac Soc 2013; 10 Suppl:S170-179; Segal et al. Ann Am Thorac Soc 2014; 11(1):108-116; Sze et al. Ann Am Thorac Soc 2014; 11 Suppl 1:S77) and in immune and inflammatory responses (Gjymishka et al. Immunotherapy 2013; 5(12):1357-1366; Kamada and Nunez. Gastroenterology 2014; Koboziev et al. Free Radic Biol Med 2013; 68C:122-133; Ooi et al. PLoS One 2014; 9(1):e86366). Microbiome profiling is also uncovering less obvious roles for microbes and their presence in unexpected locations; examples relevant to cancer include modulation of tumor microenvironments (Iida et al. Science 2013; 342(6161):967-970) and dysbiosis of bacterial populations in breast cancer tissues (Xuan et al. PLoS One 2014; 9(1):e83744).

Existing strategies for detecting pathogens associated with disease require that samples be obtained from the infected subject, and that a number of techniques be utilized to identify the pathogen (see, e.g., FIGS. 1A and 1B). These techniques typically include enzyme-linked immunosorbent assays (ELISA), specific antibodies against a specific protein of the suspected pathogen, culture of the pathogen in vitro in the laboratory, and PCR amplification strategies. PCR amplification using universal 16S ribosomal RNA primers, followed by amplicon sequencing, is the most widely used strategy for microbiome studies and provides an effective discovery tool (Cox et al. Hum Mol Genet 2013; 22(R1):R88-94), but only for bacterial species with amplicons that survive population PCR and not for viruses or eukaryotic microorganisms. 16S rRNA sequencing can also be used to screen large sets of samples, but may not discriminate between strains or report the presence of genomic variants or pathogenicity factors. Deep sequencing of the total DNA from a sample can identify bacterial, viral and other microbiome members (The Human Microbiome Project Consortium. Nature 2012; 486(7402):207-214; Cox et al. Hum Mol Genet 2013; 22(R1):R88-94; Ma et al. J Virol 2014), but with a severe penalty in efficiency. Even at the as-yet unrealized goal of $1000 per genome, total DNA sequencing is an expensive method for screening hundreds or thousands of test and control samples to detect associations of pathogens with disease. Depending on the specimen sampled, the data may overwhelmingly be from host human sequences, creating an unnecessarily large search space for locating pathogen signatures and resulting in the majority of sequence reads being discarded.

DNA microarrays have been used for metagenomics. The Lawrence Berkeley Lab/Affymetrix PhyloChip is based on ribosomal RNA sequences (Brodie et al. Appl Environ Microbiol 2006; 72(9):6288-6298). An academically developed Virochip has probes for 1500 viruses (Chen et al. J Vis Exp 2011 (50)) and has successfully detected viruses in pathology samples. The Virochip platform is limited to viruses and assays RNA that is reverse transcribed to cDNA for PCR amplification (Chen et al. J Vis Exp 2011 (50)). The Glomics GeoChip 4.0 focuses on RNA expression by bacteria in the human microbiome (Tu et al. Mol Ecol Resour 2014), and covers bacteriophage but no other viruses nor any eukaryotic microorganisms. PathGen Dx has launched a PathChip Kit that features an Affymetrix microarray for all known viruses and a broad selection of bacteria (Wong et al. Genome Biol 2007; 8(5):R93), but no eukaryotic pathogens. These and other array-based tools illustrate the demand for methods to quickly and economically screen sets of samples for broad microbial content, including species beyond bacteria (Norman et al. Gastroenterology 2014).

Because current methods for detecting and identifying pathogenic and etiological agents are inadequate, compositions and methods for expeditiously and simultaneously detecting and identifying multiple pathogens, including all currently known pathogens, are urgently required. Such compositions and methods are also useful for the diagnosis of pathogen-associated disease, including infectious diseases and cancer, and for gaining understanding of disease states resulting from co-infection by multiple pathogens. The current invention fulfills these needs.

SUMMARY OF THE INVENTION

As described herein, the present invention features compositions and methods for the detection of one or more biomarkers in a sample comprising genetic material from multiple sources and/or organisms (e.g., metagenomes, microbiomes). In particular, Applicants have developed methods for generating panels or sets of nucleotides for the detection of genetic material from multiple pathogenic organisms and agents (e.g., viruses), as well as methods for preparing samples for analysis comprising total nucleic acid extraction (e.g., DNA and RNA).

In certain embodiments, the invention features nucleotide arrays and methods for simultaneously detecting and identifying multiple types of pathogens, including viruses, bacteria, fungi, protozoa and helminths, present in a sample. Further, the arrays and methods of the invention can be used to detect heretofore unknown pathogens present in a sample, based on the presence of a region of conserved nucleic acid sequence in the heretofore unknown pathogen. In one embodiment, the pathogen's nucleic acid is derived from a sample obtained from an individual suspected to be or known to be infected with a pathogen. Detection of an infectious agent can be used to guide patient care and treatment selection (e.g., antimicrobial, antiviral, antifungal, antibacterial, or antiparasitic therapy).

The arrays and methods disclosed herein can be used to expeditiously screen a sample for the presence of both known and as-yet unknown pathogens by comparing the pathogen's nucleic acid sequence in a sample to characteristic regions of sequence common among a related group of pathogens.

Compositions and articles defined by the invention were isolated or otherwise manufactured in connection with the examples provided below. Other features and advantages of the invention will be apparent from the detailed description, and from the claims.

Definitions

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them below, unless specified otherwise.

As used herein, the articles “a” and “an” are used to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.

As used herein when referring to a measurable value such as an amount, a temporal duration, and the like, the term “about” is meant to encompass variations of ±20% or within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the specified value, as such variations are appropriate to perform the disclosed methods. Unless otherwise clear from context, all numerical values provided herein are modified by the term about.

A “biomarker” or “marker” as used herein generally refers to a nucleic acid molecule, clinical indicator, protein, or other analyte that is associated with a disease. In certain embodiments, a nucleic acid biomarker is indicative of the presence in a sample of a pathogenic organism, including but not limited to, viruses, viroids, bacteria, fungi, helminths, and protozoa. In various embodiments, a marker is differentially present in a biological sample obtained from a subject having or at risk of developing a disease (e.g., an infectious disease) relative to a reference. A marker is differentially present if the mean or median level of the biomarker present in the sample is statistically different from the level present in a reference. A reference level may be, for example, the level present in an environmental sample obtained from a clean or uncontaminated source. A reference level may be, for example, the level present in a sample obtained from a healthy control subject or the level obtained from the subject at an earlier timepoint, i.e., prior to treatment. Common tests for statistical significance include, among others, t-test, ANOVA, Kruskal-Wallis, Wilcoxon, Mann-Whitney and odds ratio. Biomarkers, alone or in combination, provide measures of relative likelihood that a subject belongs to a phenotypic status of interest. The differential presence of a marker of the invention in a subject sample can be useful in characterizing the subject as having or at risk of developing a disease (e.g., an infectious disease), for determining the prognosis of the subject, for evaluating therapeutic efficacy, or for selecting a treatment regimen.
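As an illustration of one of the tests named above, the following minimal sketch computes the Mann-Whitney U statistic for marker levels in a subject group against a reference group. The function name and the marker levels are hypothetical and are not taken from this disclosure; converting U to a significance level is left to a statistics package.

```python
from itertools import product


def mann_whitney_u(sample, reference):
    """Return the U statistic for `sample` against `reference`.

    U counts the (sample, reference) pairs in which the sample value is
    larger; tied pairs contribute one half each.
    """
    u = 0.0
    for s, r in product(sample, reference):
        if s > r:
            u += 1.0
        elif s == r:
            u += 0.5
    return u


# Hypothetical marker levels in arbitrary units (illustration only).
subject_levels = [5.1, 6.3, 7.0]
reference_levels = [1.2, 2.4, 3.3]
u_statistic = mann_whitney_u(subject_levels, reference_levels)
```

A U value equal to the product of the two group sizes means every subject level exceeds every reference level; a value near half that product suggests no separation between the groups.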

By “agent” is meant any nucleic acid molecule, small molecule chemical compound, antibody, or polypeptide, or fragments thereof.

By “alteration” or “change” is meant an increase or decrease. An alteration may be by as little as 1%, 2%, 3%, 4%, 5%, 10%, 20%, 30%, or by 40%, 50%, 60%, or even by as much as 70%, 75%, 80%, 90%, or 100%.

By “biologic sample” is meant any tissue, cell, fluid, or other material derived from an organism.

By “capture reagent” is meant a reagent that specifically binds a nucleic acid molecule or polypeptide to select or isolate the nucleic acid molecule or polypeptide.

As used herein, the terms “determining”, “assessing”, “assaying”, “measuring” and “detecting” refer to both quantitative and qualitative determinations, and as such, the term “determining” is used interchangeably herein with “assaying,” “measuring,” and the like. Where a quantitative determination is intended, the phrase “determining an amount” of an analyte and the like is used. Where a qualitative and/or quantitative determination is intended, the phrase “determining a level” of an analyte or “detecting” an analyte is used.

By “detectable moiety” is meant a composition that when linked to a molecule of interest renders the latter detectable, via spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include radioactive isotopes, magnetic beads, metallic beads, colloidal particles, fluorescent dyes, electron-dense reagents, enzymes (for example, as commonly used in an ELISA), biotin, digoxigenin, or haptens.

By “fragment” is meant a portion of a nucleic acid molecule. This portion contains, preferably, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the entire length of the reference nucleic acid molecule or polypeptide. A fragment may contain 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides.

“Hybridization” means hydrogen bonding, which may be Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary nucleobases. For example, adenine and thymine are complementary nucleobases that pair through the formation of hydrogen bonds.

The terms “isolated,” “purified,” or “biologically pure” refer to material that is free to varying degrees from components which normally accompany it as found in its native state. “Isolate” denotes a degree of separation from original source or surroundings. “Purify” denotes a degree of separation that is higher than isolation. A “purified” or “biologically pure” protein is sufficiently free of other materials such that any impurities do not materially affect the biological properties of the protein or cause other adverse consequences. That is, a nucleic acid or peptide of this invention is purified if it is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. Purity and homogeneity are typically determined using analytical chemistry techniques, for example, polyacrylamide gel electrophoresis or high performance liquid chromatography. The term “purified” can denote that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. For a protein that can be subjected to modifications, for example, phosphorylation or glycosylation, different modifications may give rise to different isolated proteins, which can be separately purified.

By “reference” is meant a standard of comparison. As is apparent to one skilled in the art, an appropriate reference is where an element is changed in order to determine the effect of the element. In one embodiment, the level of a target nucleic acid molecule present in a sample may be compared to the level of the target nucleic acid molecule present in a clean or uncontaminated sample. For example, the level of a target nucleic acid molecule present in a sample may be compared to the level of the target nucleic acid molecule present in a corresponding healthy cell or tissue or in a diseased cell or tissue (e.g., a cell or tissue derived from a subject having a disease, disorder, or condition).

By “Marker profile” is meant a characterization of the signal, level, expression or expression level of two or more markers (e.g., polynucleotides).

As used herein, the term “nucleic acid” refers to deoxyribonucleotides, ribonucleotides, or modified nucleotides, and polymers thereof in single- or double-stranded form. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring. Nucleic acid molecules useful in the methods of the invention include any nucleic acid molecule that specifically binds a target nucleic acid (e.g., a nucleic acid biomarker). Such nucleic acid molecules need not be 100% identical with an endogenous nucleic acid sequence, but will typically exhibit substantial identity. Polynucleotides having “substantial identity” to an endogenous sequence are typically capable of hybridizing with at least one strand of a double-stranded nucleic acid molecule. By “hybridize” is meant pair to form a double-stranded molecule between complementary polynucleotide sequences (e.g., a gene described herein), or portions thereof, under various conditions of stringency. (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399; Kimmel, A. R. (1987) Methods Enzymol. 152:507).

For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl and 50 mM trisodium citrate, and more preferably less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In a preferred embodiment, hybridization will occur at 30° C. in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 37° C. in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 μg/ml denatured salmon sperm DNA (ssDNA). In a most preferred embodiment, hybridization will occur at 42° C. in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 μg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.

For most applications, washing steps that follow hybridization will also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature. As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25° C., more preferably of at least about 42° C., and even more preferably of at least about 68° C. In a preferred embodiment, wash steps will occur at 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 42° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 68° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art. Hybridization techniques are well known to those skilled in the art and are described, for example, in Benton and Davis (Science 196:180, 1977); Grunstein and Hogness (Proc. Natl. Acad. Sci., USA 72:3961, 1975); Ausubel et al. (Current Protocols in Molecular Biology, Wiley Interscience, New York, 2001); Berger and Kimmel (Guide to Molecular Cloning Techniques, 1987, Academic Press, New York); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.
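The paired NaCl and trisodium citrate concentrations recited above keep a 10:1 ratio, which matches saline-sodium citrate (SSC) buffer. The sketch below expresses each recited pair as a multiple of 1×SSC, assuming the conventional definition of 1×SSC as 150 mM NaCl and 15 mM trisodium citrate. That convention is not stated in this specification, and the function name is hypothetical.

```python
# Assumption: 1x SSC is 150 mM NaCl plus 15 mM trisodium citrate
# (a common laboratory convention, not recited in the specification).
SSC_1X_NACL_MM = 150.0
SSC_1X_CITRATE_MM = 15.0


def ssc_strength(nacl_mm, citrate_mm):
    """Return the SSC multiple for a NaCl/citrate pair given in mM.

    Raises ValueError when the two components do not correspond to the
    same dilution of SSC.
    """
    from_nacl = nacl_mm / SSC_1X_NACL_MM
    from_citrate = citrate_mm / SSC_1X_CITRATE_MM
    if abs(from_nacl - from_citrate) > 1e-9:
        raise ValueError("NaCl and citrate are not in the SSC ratio")
    return from_nacl


# Salt conditions recited for hybridization and for the wash steps.
hybridization = [ssc_strength(750, 75), ssc_strength(500, 50), ssc_strength(250, 25)]
washes = [ssc_strength(30, 3), ssc_strength(15, 1.5)]
```

Under this assumption, the hybridization conditions run from 5×SSC downward and the wash conditions correspond to 0.2× and 0.1×SSC, consistent with stringency rising as salt concentration falls.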

By “substantially identical” is meant a polypeptide or nucleic acid molecule exhibiting at least 50% identity to a reference amino acid sequence (for example, any one of the amino acid sequences described herein) or nucleic acid sequence (for example, any one of the nucleic acid sequences described herein). Preferably, such a sequence is at least 60%, more preferably 80% or 85%, and more preferably 90%, 95%, 96%, 97%, 98%, or even 99% or more identical at the amino acid level or nucleic acid to the sequence used for comparison.

Sequence identity is typically measured using sequence analysis software (for example, Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, BLAST, BESTFIT, GAP, or PILEUP/PRETTYBOX programs). Such software matches identical or similar sequences by assigning degrees of homology to various substitutions, deletions, and/or other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. In an exemplary approach to determining the degree of identity, a BLAST program may be used, with a probability score between e⁻³ and e⁻¹⁰⁰ indicating a closely related sequence.
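For illustration only, percent identity over two sequences that have already been aligned can be sketched as follows. This is not the scoring used by BLAST or the other named programs; the function names, the gap character, and the choice of denominator (aligned columns, excluding columns that are gaps in both sequences) are assumptions, and different tools define the denominator differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def is_substantially_identical(aligned_a, aligned_b, threshold=50.0):
    """Apply the 'at least 50% identity' floor recited in the definition above."""
    return percent_identity(aligned_a, aligned_b) >= threshold


identity = percent_identity("ACGTACGT", "ACGTTCGT")  # one mismatch in eight columns
```

The `threshold` argument can be raised to 60, 80, 90 or higher to reflect the preferred identity levels.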

As used herein, the term “sample” includes a biologic sample such as any tissue, cell, fluid, or other material derived from an organism.

By “specifically binds” is meant a compound (e.g., nucleic acid probe or primer) that recognizes and binds a molecule (e.g., a nucleic acid biomarker), but which does not substantially recognize and bind other molecules in a sample, for example, a biological sample.

By “subject” is meant a mammal, including, but not limited to, a human or non-human mammal, such as a bovine, equine, canine, ovine, or feline. The term “subject” may refer to an animal, which is the object of treatment, observation, or experiment (e.g., a patient).

By “target nucleic acid molecule” is meant a polynucleotide to be analyzed. Such polynucleotide may be a sense or antisense strand of the target sequence. The term “target nucleic acid molecule” also refers to amplicons of the original target sequence. In various embodiments, the target nucleic acid molecule is one or more nucleic acid biomarkers.

As used herein, the terms “treat,” “treating,” “treatment,” and the like refer to reducing or ameliorating a disorder and/or symptoms associated therewith. It will be appreciated that, although not precluded, treating a disorder or condition does not require that the disorder, condition or symptoms associated therewith be completely eliminated.

Ranges provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50.

Any compounds, compositions, or methods provided herein can be combined with one or more of any of the other compositions and methods provided herein. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.

Unless specifically stated or obvious from context, as used herein, the term “or” is understood to be inclusive.

The term “including” is used herein to mean, and is used interchangeably with, the phrase “including but not limited to.”

As used herein, the terms “comprises,” “comprising,” “containing,” “having” and the like can have the meaning ascribed to them in U.S. Patent law and can mean “includes,” “including,” and the like; “consisting essentially of” or “consists essentially” likewise has the meaning ascribed in U.S. Patent law and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited are not changed by the presence of more than that which is recited, but excludes prior art embodiments.

Other features and advantages of the invention will be apparent from the following description of the desirable embodiments thereof, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A and FIG. 1B depict several current options for testing for the presence of a microorganism (e.g., a pathogenic microorganism). FIG. 1A depicts features involved in current testing options. In particular, current testing options require culturing for subsequent analysis by antibody bead capture, DNA-bead capture, polymerase chain reaction, immunoassay, DNA probe amplification, colony counting, and restriction digest mapping. Current testing options are also limited to one targeted organism analyzed per test. FIG. 1B is a table comparing specific features of current testing options.

FIG. 2A and FIG. 2B depict design of the PathoChip. FIG. 2A depicts use of a metagenome for probe selection for the PathoChip. Sequence accessions for all viruses and selected human pathogenic microorganisms were retrieved from the NCBI DNA sequence databases and concatenated to form a metagenome. Wherever possible, regions of target sequence unique to the accession (a, c) were used to select multiple 60 nt probes (1, 2, 4-6 in figure) for microarray synthesis, and probes to target regions that share similar sequences in at least two viral accessions (b) were also identified. Probes to prokaryotic and eukaryotic pathogens may map to intergenic, gene or ribosomal RNA sequences, or a mixture of target types, depending on the availability of sequence data. FIG. 2B is a schematic of the process for designing the PathoChip. Parallel and iterative design processes were used to assemble the PathoChip probe collection that covers unique and conserved target regions, supplemented with high-resolution probe tiling for known cancer-associated microorganisms.

FIG. 3A and FIG. 3B depict a sample screening workflow using the PathoChip. FIG. 3A depicts that a culturing step is not required to prepare a sample for use with the PathoChip. FIG. 3B depicts a nucleic acid extraction method for the preparation of a sample containing both DNA and RNA. ¹ It is unknown what bacterial/viral materials may be lost during xylene de-paraffinization. ² The pellet from this spin should contain large genomic DNA, cells that remain intact, and cellular debris. The spin is probably not sufficient to pellet intact viral particles. ³ Viral DNA here is only from unpelleted intact particles. Viral DNA released from lysed host cells should be in pellet. Viral RNA is from intact particles or lysed host cells. ⁴ 80 or 90° C. reverses formalin crosslinking. ⁵ A small aliquot to retain the chance of recovering nucleic acid from any unpelleted, intact particles or cells.

FIG. 4 lists foodborne pathogens that the panel of probes in PathoChip v3 is capable of detecting, including 76 organisms using multiple targeting sequences.
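The distinction drawn in FIG. 2A between regions unique to one accession and regions shared by at least two accessions can be sketched as a window classification over a set of sequences. This is a simplified illustration, not the probe design process of FIG. 2B: it uses exact string matches only, whereas the text refers to regions that share similar sequences, and the function name, accession names, and toy sequences are hypothetical.

```python
from collections import defaultdict


def classify_windows(accessions, probe_len=60):
    """Split every probe-length window into unique and conserved candidates.

    `accessions` maps an accession name to its sequence. A window is unique
    when it occurs in exactly one accession and conserved when it occurs in
    two or more (exact matches only).
    """
    owners = defaultdict(set)
    for name, seq in accessions.items():
        for start in range(len(seq) - probe_len + 1):
            owners[seq[start:start + probe_len]].add(name)

    unique = defaultdict(list)
    conserved = []
    for window, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(window)
        else:
            conserved.append(window)
    return dict(unique), conserved


# Toy sequences and a 4 nt window keep the example readable; the
# specification describes 60 nt probes.
toy = {"virus_a": "AAAACCCCGG", "virus_b": "TTTTCCCCGA"}
unique_windows, conserved_windows = classify_windows(toy, probe_len=4)
```

In this toy input the windows CCCC and CCCG occur in both accessions and are reported as conserved, while every other window is assigned to the single accession that contains it.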

FIG. 5A through FIG. 5I depict identification of probes used for detecting a targeted species. FIG. 5A is a graph depicting selection and discarding of probes (e.g., for detecting Clostridium perfringens). FIG. 5B is a graph depicting probe selection for detecting Legionella pneumophila. FIG. 5C is a graph depicting probe selection for detecting Yersinia enterocolitica. FIG. 5D is a graph depicting probe selection for detecting Escherichia coli. FIG. 5E is a graph depicting probe selection for detecting Vibrio cholerae. FIG. 5F is a graph depicting probe selection for detecting Clostridium perfringens. FIG. 5G is a graph depicting probe selection for detecting Salmonella enterica. FIG. 5H is a graph depicting probe selection for detecting Shigella flexneri. FIG. 5I is a graph depicting probe selection for detecting Listeria monocytogenes. All probes for each target were analyzed to identify probe subsets with high sensitivity, using aliquots of diluted stock containing 1000, 100, 10 or 1 cell of each target for DNA+RNA extraction, amplification, and hybridization to PathoChip v3. For each targeted organism, probes were selected that demonstrated appropriate detection response. Subsets of probes selected in this manner were used as an assay panel in subsequent studies.
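The text does not define what counts as an appropriate detection response. As one hypothetical reading, the sketch below keeps a probe when its signal rises at every step of the 1, 10, 100, 1000 cell dilution series and its signal at the lowest input exceeds a background value. The criterion, the function name, the probe identifiers, and the intensities are all assumptions made for illustration.

```python
# Cell inputs of the dilution series, lowest first (from the figure description).
CELL_COUNTS = (1, 10, 100, 1000)


def select_probes(signals_by_probe, background):
    """Return probe IDs whose signal rises with cell input and clears background.

    `signals_by_probe` maps a probe ID to its intensities, ordered to match
    CELL_COUNTS. Both criteria are illustrative assumptions.
    """
    selected = []
    for probe_id, signals in signals_by_probe.items():
        rising = all(low < high for low, high in zip(signals, signals[1:]))
        above_background = signals[0] > background
        if rising and above_background:
            selected.append(probe_id)
    return selected


# Hypothetical intensities in arbitrary units.
example_signals = {
    "probe_1": [120, 300, 900, 2500],  # rises and clears background
    "probe_2": [90, 80, 400, 1000],    # dips between 1 and 10 cells
    "probe_3": [40, 200, 800, 2000],   # below background at 1 cell
}
panel = select_probes(example_signals, background=100)
```

Only `probe_1` satisfies both conditions here, so it alone would enter the assay panel under this reading.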

FIG. 6A through FIG. 6H show that selected probes were summed to report a single detection signal. FIG. 6A is a graph depicting summation of signal from selected probes to detect Salmonella enterica. FIG. 6B is a graph depicting summation of signal from selected probes to detect Listeria monocytogenes. FIG. 6C is a graph depicting summation of signal from selected probes to detect Shigella flexneri. FIG. 6D is a graph depicting summation of signal from selected probes to detect Clostridium perfringens. FIG. 6E is a graph depicting summation of signal from selected probes to detect Yersinia enterocolitica. FIG. 6F is a graph depicting summation of signal from selected probes to detect Vibrio cholerae. FIG. 6G is a graph depicting probe selection for detecting Escherichia coli O157:H7. FIG. 6H is a graph depicting probe selection for detecting Legionella pneumophila.

FIG. 7A through FIG. 7H show detection signals of selected probe sets specific for various bacteria when mixed with human and lettuce cells. FIG. 7A is a graph depicting detection signal computed as the sum of selected probes for each test sample (solid line) and the control sample background (dotted line) (e.g., for detecting Clostridium perfringens mixed with human and lettuce cells). FIG. 7B is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Escherichia coli O157:H7 cells mixed with human and lettuce cells. FIG. 7C is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Legionella pneumophila cells mixed with human and lettuce cells. FIG. 7D is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Listeria monocytogenes cells mixed with human and lettuce cells. FIG. 7E is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Salmonella enterica cells mixed with human and lettuce cells. FIG. 7F is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Shigella flexneri cells mixed with human and lettuce cells. FIG. 7G is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Vibrio cholerae cells mixed with human and lettuce cells. FIG. 7H is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Yersinia enterocolitica cells mixed with human and lettuce cells. A small number of pathogen cells in a background of a large number of host cells can be detected, but with less absolute signal compared to pure pathogen cultures (see FIG. 6). The difference between test signal and host-only control signal indicates detection ability. The ability to quantify the amount of pathogen requires more cells than the ability to detect mere presence or absence.
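By way of non-limiting illustration, the summed detection signal described for FIG. 6 and FIG. 7 may be sketched as follows. The function names, the dictionary layout, and the probe identifiers are hypothetical and are introduced only for this example; the sketch assumes that each probe has one intensity value per sample.

```python
def detection_signal(intensities, selected_probes):
    """Sum the intensities of the selected probes into one detection signal."""
    return sum(intensities[probe] for probe in selected_probes)


def signal_over_background(test, control, selected_probes):
    """Difference between the test signal and the host-only control signal."""
    return (detection_signal(test, selected_probes)
            - detection_signal(control, selected_probes))


# Hypothetical probe intensities for a test sample and a host-only control.
test_sample = {"probe_1": 120.0, "probe_2": 80.0, "probe_3": 5.0}
host_control = {"probe_1": 20.0, "probe_2": 10.0, "probe_3": 4.0}
selected = ["probe_1", "probe_2"]

difference = signal_over_background(test_sample, host_control, selected)
```

A larger difference between the test and control sums would correspond to a clearer separation between the solid and dotted lines in the figures.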

FIG. 8A through FIG. 8H show detection signals of selected probe sets specific for various bacteria when mixed with milk. FIG. 8A is a graph depicting detection signal computed as the sum of selected probes for each test sample (solid line) and the control sample background (dotted line) (e.g., for detecting Clostridium perfringens in milk). FIG. 8B is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Toxoplasma gondii cells in milk. FIG. 8C is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Vibrio cholerae cells in milk. FIG. 8D is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Yersinia enterocolitica cells in milk. FIG. 8E is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Escherichia coli O157:H7 cells in milk. FIG. 8F is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Legionella pneumophila cells in milk. FIG. 8G is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Salmonella enterica cells in milk. FIG. 8H is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Shigella flexneri cells in milk.

FIG. 9A through FIG. 9H show detection signals of selected probe sets specific for various bacteria when mixed with tomato. FIG. 9A is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Clostridium perfringens cells mixed with tomato. FIG. 9B is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Toxoplasma gondii cells mixed with tomato. FIG. 9C is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Vibrio cholerae cells mixed with tomato. FIG. 9D is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Yersinia enterocolitica cells mixed with tomato. FIG. 9E is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Escherichia coli O157:H7 cells mixed with tomato. FIG. 9F is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Legionella pneumophila cells mixed with tomato. FIG. 9G is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Shigella flexneri cells mixed with tomato. FIG. 9H is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Salmonella enterica cells mixed with tomato.

FIG. 10A through FIG. 10D show detection signals of selected probe sets specific for various bacteria when mixed with clam. FIG. 10A is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Clostridium perfringens cells mixed with clam. FIG. 10B is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Toxoplasma gondii cells mixed with clam. FIG. 10C is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Vibrio cholerae cells mixed with clam. FIG. 10D is a graph depicting detection signals for selected probe sets to assay samples containing various numbers of Yersinia enterocolitica cells mixed with clam.

FIG. 11 is a graph depicting the accessional analysis of 60,000 probes of the PathoChip on a patient sample. Accessional analysis identified strong signal associated with fungi of the Rhizomucor genus in the patient sample compared to control. Accession signal is defined as the average green (g) signal of all probes per accession minus the average red (r) signal of all probes per accession.
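As a minimal sketch of the accession signal defined for FIG. 11, the per-accession averages may be computed as shown below. The tuple layout and the accession identifiers are hypothetical; the sketch also assumes that the green channel carries the test sample and the red channel carries the control, which the figure description implies but does not state.

```python
from collections import defaultdict
from statistics import mean


def accession_signals(probe_readings):
    """Return {accession: mean(green) - mean(red)} over all probes per accession.

    probe_readings is an iterable of (accession, green, red) tuples,
    one tuple per probe (hypothetical layout for this sketch).
    """
    green = defaultdict(list)
    red = defaultdict(list)
    for accession, g, r in probe_readings:
        green[accession].append(g)
        red[accession].append(r)
    return {acc: mean(green[acc]) - mean(red[acc]) for acc in green}


# Hypothetical readings: two probes for ACC_1, one probe for ACC_2.
readings = [("ACC_1", 900, 100), ("ACC_1", 700, 100), ("ACC_2", 50, 60)]
signals = accession_signals(readings)
```

Accessions may then be ranked by this value to find those with the strongest signal relative to control.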

FIGS. 12A and 12B are heat maps showing the hybridization signal of all the probes of the accessions selected by accession analysis. FIG. 12A shows heat maps generated from analysis of patient and control samples using the PathoChip. FIG. 12B shows the heat maps after eliminating probes that were either undetected or were also present in the control. The remaining probes indicated fungal hybridization signals.

FIG. 13 is a graph showing that Rhizomucor pusillus strain NRRL28626 and Rhizomucor miehei had the most prominent signals from the screen. The top 4 pathogens in the patient sample, providing high accession signals, were fungal, including Rhizomucor pusillus strain NRRL28626, Rhizomucor miehei, Rhizomucor pusillus, and Rhodotorula laryngis.

DETAILED DESCRIPTION OF THE INVENTION

As described herein, the present invention features compositions and methods for the detection of one or more biomarkers in a sample comprising genetic material from multiple sources and/or organisms (e.g., metagenomes, microbiomes). In particular, Applicants have developed methods for generating panels or sets of nucleotides for the detection of genetic material from multiple pathogenic organisms and agents (e.g., viruses), as well as methods for preparing samples for analysis comprising total nucleic acid extraction (e.g., DNA and RNA).

As described herein, development of the PathoChip platform containing probes for all public virus sequences and hundreds of pathogenic bacteria, fungi, and helminths provides wide coverage of pathogens in an economical format. The PathoChip platform is differentiated from current technologies by providing faster results that are important to manufacturers and distributors challenged by product shelf life. In one aspect, the PathoChip platform can be used to perform clinical assays for patient diagnosis, thus having the potential to impact patient therapy and care. Where possible, multiple probes to independent regions of the target genome are used to improve opportunity for detection. While PathoChip content was developed from sequences to known targets, some ability to discover new strains or organisms is provided by the inclusion of probes to sequences that are conserved within and between viral families; a previously unknown virus with homology to a conserved sequence may produce a corresponding hybridization signal at such a probe, if not to a complete probe set. A supporting workflow is described for profiling biological and environmental samples, and includes simultaneous detection of DNA and RNA to expand the range of targets available for hybridization. The PathoChip platform has demonstrated success in non-food applications, as well as detection of major bacterial pathogens in food samples.

Target Nucleic Acid Molecules

Methods and compositions of the invention are useful for the identification of a target nucleic acid molecule in a test sample or material to be analyzed. Target sequences are amplified from any sample that comprises a target nucleic acid molecule, including but not limited to environmental, non-biological, and biological samples. Such samples may comprise fungi, spores, viruses, or cells (e.g., prokaryotes, eukaryotes). In specific embodiments, compositions and methods of the invention detect one or more pathogenic organisms, including viruses, viroids, bacteria, fungi, helminths, and/or protozoa.

Exemplary test samples include body fluids (e.g., blood, serum, plasma, amniotic fluid, sputum, urine, cerebrospinal fluid, lymph, tear fluid, feces, or gastric fluid), tissue extracts, culture media (e.g., a liquid in which a cell, such as a pathogen cell, has been grown), environmental samples, agricultural products or other foodstuffs and their extracts, and DNA identification tags. If desired, the sample is purified prior to detection using any standard method typically used for isolating a nucleic acid molecule from a biological sample. In one embodiment, a target nucleic acid of a pathogen is amplified by primer/template oligonucleotides to detect the presence of a pathogen in a sample. Exemplary pathogens include fungi, bacteria, viruses and yeast. Such pathogens may be detected by identifying a nucleic acid molecule encoding a pathogen nucleic acid sequence in a test sample.

In one embodiment, a sample is a biological sample, such as a tissue sample. The level of one or more polynucleotide biomarkers (e.g., to detect or identify viruses, bacteria, fungi, helminths, and/or protozoa) is measured in different types of biologic samples. In one embodiment, the biologic sample is a tissue sample that includes cells of a tissue or organ, for example, from a biopsy. In another embodiment, the biologic sample is a biologic fluid sample. Biological fluid samples include cerebrospinal fluid, blood, blood serum, plasma, urine, and saliva, or any other biological fluid useful in the methods of the invention.

In another embodiment, a sample is an environmental sample, such as soil, sediment, water, or air. Environmental samples can be obtained from an industrial source, such as a farm, waste stream, or water source. For environmental applications, test samples may include water, liquid extracts of air filters, soil samples, building materials (e.g., drywall, ceiling tiles, wall board, fabrics, wall paper, and floor coverings), environmental swabs, or any other sample.

Target nucleic acid molecules include double-stranded and single-stranded nucleic acid molecules (e.g., DNA, RNA, and other nucleobase polymers known in the art capable of hybridizing with a nucleic acid molecule described herein). RNA molecules suitable for detection with a detectable oligonucleotide probe or detectable primer/template oligonucleotide of the invention include, but are not limited to, double-stranded and single-stranded RNA molecules that comprise a target sequence (e.g., messenger RNA, viral RNA, ribosomal RNA, transfer RNA, microRNA and microRNA precursors, and siRNAs or other RNAs described herein or known in the art). DNA molecules suitable for detection with a detectable oligonucleotide probe or primer/template oligonucleotide of the invention include, but are not limited to, double-stranded DNA (e.g., genomic DNA, plasmid DNA, mitochondrial DNA, viral DNA, and synthetic double-stranded DNA). Single-stranded DNA target nucleic acid molecules include, for example, viral DNA, cDNA, and synthetic single-stranded DNA, or other types of DNA known in the art. In general, a target sequence for detection is between about 30 and about 300 nucleotides in length (e.g., 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300 nucleotides). In a specific embodiment, the target sequence is about 60 nucleotides in length. A target sequence for detection may also have at least about 70, 80, 90, 95, 96, 97, 98, 99, or even 100% identity to a probe sequence. Probe sequences may be longer or shorter than the target sequence. For example, a 60-nucleotide probe may hybridize to at least about 44 nucleotides of a target sequence.
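For illustration only, percent identity between a probe and an equal-length target segment may be estimated by counting matching positions, as sketched below. This ungapped position-by-position count is a simplifying assumption introduced for the example; it is not an alignment tool and does not model hybridization. The sequences are synthetic.

```python
def percent_identity(probe, target):
    """Percent of positions at which two equal-length sequences match."""
    if len(probe) != len(target):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


# Synthetic 60-nucleotide probe; the target matches at the first 44 positions
# and every one of the remaining 16 positions is swapped to a different base.
probe = "ACGT" * 15
swap = str.maketrans("ACGT", "CATG")
target = probe[:44] + probe[44:].translate(swap)

identity = percent_identity(probe, target)
```

In this synthetic case, 44 matching positions out of 60 correspond to roughly 73% identity, which falls within the "at least about 70%" range recited above.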

In particular embodiments, a biomarker is a biomolecule (e.g., nucleic acid molecule) that is differentially present in a sample (e.g., a biological, non-biological, or environmental sample). For example, a biomarker is differentially present in a sample taken from a subject of one phenotypic status (e.g., having a disease) as compared with another phenotypic status (e.g., not having the disease). A biomarker is differentially present between different phenotypic statuses if the difference in the mean or median expression level of the biomarker between the groups is calculated to be statistically significant. Common tests for statistical significance include, among others, t-test, ANOVA, Kruskal-Wallis, Wilcoxon, Mann-Whitney and odds ratio. Biomarkers, alone or in combination, provide measures of relative risk that a subject belongs to one phenotypic status or another. Therefore, they are useful as markers for characterizing a disease.
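As one hedged example of the tests named above, the Mann-Whitney U statistic for two groups of biomarker levels may be computed directly from its pairwise definition, as sketched below. The group values are invented for illustration. The sketch returns only the U statistics; converting U to a p-value requires exact tables or a normal approximation, neither of which is shown.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by pairwise comparison; ties count one half."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical biomarker levels in diseased and non-diseased subjects.
diseased = [5.0, 6.0, 7.0]
healthy = [1.0, 2.0, 3.0]
u_diseased, u_healthy = mann_whitney_u(diseased, healthy)
```

When every value in one group exceeds every value in the other, as in this invented data, one U statistic equals the product of the group sizes and the other equals zero.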

Probe Selection

The invention provides sets of probes that are selected for detecting multiple target nucleic acid molecules (e.g., corresponding to multiple bioorganisms). In various embodiments, the invention provides a metagenome, its construction, and its use in the methods of the invention. As used herein, “metagenome” refers to genetic material from more than one organism, e.g., in an environmental sample. The metagenome is used to select the sets of probes and/or to validate probe sets. In some embodiments, the metagenome comprises the sequences or genomes of about 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 1000, 1500, 2000 or more organisms. In one example, the nucleic acid sequences of thousands of organisms were linked to generate a metagenome comprising 58 chromosomes.

Discrete Metagenome Probe Selection

A. Download individual genomes, genes and partial sequences into a local database of accessions
B. Mask low complexity sequences using bioinformatic tools. In one example, low complexity sequences are masked using mdust (http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Mdust.html) followed by BLASTN 2.0MP-WashU31 identification of unique regions in viral accessions.
C. BLASTN sequence comparison of each accession against all other accessions
D. Identify specific target regions within each accession
    1. 250-300 bp regions
    2. No more than 50 contiguous nucleotides with 70% or greater sequence homology to any other accession or to the human genome
E. Supplement specific targets
    1. Identify any accessions with zero or one target region
    2. Relax stringency parameters to no more than 30 contiguous nucleotides with 50% or greater sequence homology to any other accession, but no more than 50 contiguous nucleotides with 70% or greater sequence homology to human genome
    3. Re-run target region identification on accession subset from 1.E.1.
F. Identify conserved target regions
    1. 70-300 bp regions that have 70% or greater homology with at least one other accession
    2. Remove conserved targets with 50 or more contiguous nucleotides with 70% or greater sequence homology to human genome
G. Choose probes
    1. Run Agilent array CGH probe selection algorithm on specific and conserved target regions
    2. Rank probes by Agilent design score
    3. Select 1-3 highest ranking probes from 1-5 specific target regions in each accession
    4. Select 1-3 highest ranking probes from each conserved target region
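The ranking and selection in step G above may be sketched as follows. This is an illustration under stated assumptions, not the Agilent algorithm: design scores are taken as given inputs, higher scores are assumed to be better, and target regions are ordered by their best-scoring probe, a rule the workflow does not specify. All names and values are hypothetical.

```python
from collections import defaultdict


def choose_probes(probes, max_regions=5, max_per_region=3):
    """Keep the top-scoring probes from the top-ranked regions of each accession.

    probes is a list of (name, accession, region, score) tuples.
    Assumption: regions are ranked by their best probe score.
    """
    by_region = defaultdict(list)
    for probe in probes:
        _, accession, region, _ = probe
        by_region[(accession, region)].append(probe)

    by_accession = defaultdict(list)
    for (accession, _), members in by_region.items():
        members.sort(key=lambda p: p[3], reverse=True)
        by_accession[accession].append(members[:max_per_region])

    chosen = []
    for regions in by_accession.values():
        regions.sort(key=lambda members: members[0][3], reverse=True)
        for members in regions[:max_regions]:
            chosen.extend(members)
    return chosen


# Hypothetical candidates: four probes in region r1, one probe in region r2.
candidates = [
    ("a", "ACC_1", "r1", 0.9), ("b", "ACC_1", "r1", 0.8),
    ("c", "ACC_1", "r1", 0.7), ("d", "ACC_1", "r1", 0.6),
    ("e", "ACC_1", "r2", 0.5),
]
selected_names = [p[0] for p in choose_probes(candidates)]
```

The defaults correspond to the upper ends of the 1-3 probe and 1-5 region ranges recited in step G.3; smaller values may be passed to select fewer probes.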

Concatenated Metagenome Probe Selection

A. Download individual genomes, genes and partial sequences into a local database of accessions
B. Compile all accessions into a single concatenated metagenome to facilitate use of genomics bioinformatics tools
    1. Place 100 nonspecific nucleotides (“N”) as spacers between each accession
    2. Join accessions and spacers into chromosomes of 6-10 million bases
C. Run Agilent array CGH probe selection algorithm for specificity within the metagenome
D. Filter probes for specificity against human, mouse, and/or other mammalian genomes
E. Choose specific probes
    1. Rank probes by Agilent design score
    2. Select 10-20 highest ranking probes from each accession
    3. Require at least 100 bp separation between probes
F. Choose conserved probes
    1. Identify conserved regions as in 1.F.
    2. Select 5-10 highest ranking probes from each conserved region
    3. Require at least 100 bp separation between probes
G. Empirical probe selection
    1. Manufacture microarrays containing all specific and conserved probes
    2. Hybridize microarrays to labeled human DNA
    3. Select 5-10 specific probes from each accession with lowest cross-hybridization signal
    4. Select 3-5 conserved probes from each conserved region with lowest cross-hybridization signal
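Step B of the concatenated workflow above may be sketched as shown below, assuming accessions are available as in-memory strings. The function name, the coordinate index, and the packing rule (start a new chromosome when the next accession would exceed the cap) are assumptions made for this example. Only the upper bound on chromosome length is enforced; the 6 million base lower bound is not, and an accession longer than the cap is placed alone in its own chromosome.

```python
SPACER = "N" * 100  # 100 nonspecific nucleotides between accessions


def build_chromosomes(accessions, max_len=10_000_000):
    """Join (accession_id, sequence) pairs into spacer-separated chromosomes.

    Returns a list of (chromosome_sequence, offsets), where offsets maps each
    accession_id to its (start, end) coordinates within that chromosome.
    """
    chromosomes = []
    parts, offsets, length = [], {}, 0
    for acc_id, seq in accessions:
        needed = len(seq) + (len(SPACER) if parts else 0)
        if parts and length + needed > max_len:
            # Close the current chromosome and start a new one.
            chromosomes.append(("".join(parts), offsets))
            parts, offsets, length = [], {}, 0
        if parts:
            parts.append(SPACER)
            length += len(SPACER)
        offsets[acc_id] = (length, length + len(seq))
        parts.append(seq)
        length += len(seq)
    if parts:
        chromosomes.append(("".join(parts), offsets))
    return chromosomes


# Small synthetic example with an 800-base cap instead of 10 million.
toy = [("acc_a", "A" * 300), ("acc_b", "C" * 300), ("acc_c", "G" * 300)]
result = build_chromosomes(toy, max_len=800)
```

Keeping the coordinate index allows a probe position reported on a concatenated chromosome to be mapped back to its source accession.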

Sample Preparation

The invention also provides a means for analyzing multiple types of nucleic acids present in a sample, including DNA and RNA. In various embodiments, sample preparation involves extracting a mixture of nucleic acid molecules (e.g., DNA and RNA). In other embodiments, sample preparation involves extracting a mixture of nucleic acids from multiple organisms, cell types, infectious agents, or any combination thereof. In one embodiment, sample preparation involves the workflow below.

A. Fragment genomic DNA
B. Convert total RNA to first strand cDNA by random-primed reverse transcriptase
C. Label genomic DNA with biotin or fluorescent dye by chemical or enzymatic incorporation
D. Label cDNA with biotin or fluorescent dye by chemical or enzymatic incorporation
E. Label a mixture of genomic DNA and cDNA in the same chemical or enzymatic reaction
F. Mix C+D and co-hybridize to microarray of probes
G. Hybridize E to microarray of probes
H. Amplify targeted genomic DNA
    1. Use whole-genome amplification (GE GenomiPhi, Sigma WGA, NuGEN Ovation DNA) to non-specifically amplify genomic DNA
    2. Use amplified products as input for 4.C. or 4.E.
I. Amplify targeted total RNA
    1. Use whole-transcriptome amplification (Sigma WTA, Ambion in vitro transcription, NuGEN Ovation RNA) to non-specifically amplify total RNA
    2. Use amplified products as input for 4.D. or 4.E.

The samples are hybridized to the microarray (e.g., PathoChip), and the microarrays are washed at various stringencies. Microarrays are scanned for detection of fluorescence. Background correction and inter-array normalization algorithms are applied. Detection thresholds are applied. The results are analyzed for statistical significance.
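The post-scan processing steps named above are not specified further. The following sketch shows one possible arrangement, in which every method is an assumption chosen for illustration: local background subtraction with a floor, median scaling across arrays, and a threshold set at the mean of negative-control probes plus k standard deviations. None of these is stated to be the method actually used.

```python
from statistics import mean, median, stdev


def background_correct(foreground, background, floor=1.0):
    """Subtract local background from each spot, never going below floor."""
    return [max(f - b, floor) for f, b in zip(foreground, background)]


def median_scale(arrays, target=None):
    """Scale each array so that all arrays share the same median intensity."""
    medians = [median(a) for a in arrays]
    if target is None:
        target = mean(medians)
    return [[v * target / m for v in a] for a, m in zip(arrays, medians)]


def detection_threshold(negative_controls, k=3.0):
    """Mean of negative-control intensities plus k sample standard deviations."""
    return mean(negative_controls) + k * stdev(negative_controls)


# Hypothetical spot intensities.
corrected = background_correct([100, 50, 10], [20, 20, 20])
scaled = median_scale([[1, 2, 3], [2, 4, 6]])
threshold = detection_threshold([10, 12, 14])
```

A probe would then be called detected when its corrected, normalized intensity exceeds the threshold; statistical analysis of the resulting calls is a separate step.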

Nucleic Acid Amplification

Target nucleic acid sequences are optionally amplified before being detected. The term “amplified” defines the process of making multiple copies of the nucleic acid from a single or lower copy number of nucleic acid sequence molecule. The amplification of nucleic acid sequences is carried out in vitro by biochemical processes known to those of skill in the art. Prior to or concurrent with identification, the viral sample may be amplified by a variety of mechanisms, some of which may employ PCR. For example, primers for long range PCR may be designed to amplify regions of the sequence. For RNA viruses, a first reverse transcriptase step may be used to generate double-stranded DNA from the single-stranded RNA. See, for example, PCR Technology: Principles and Applications for DNA Amplification (Ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (Eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (Eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. Nos. 4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,333,675. The sample may be amplified on the array. See, for example, U.S. Pat. No. 6,300,070 and U.S. Ser. No. 09/513,300.

Other suitable amplification methods include the ligase chain reaction (LCR) (for example, Wu and Wallace, Genomics 4, 560 (1989), Landegren et al., Science 241, 1077 (1988) and Barringer et al. Gene 89:117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989) and WO88/10315), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA, 87, 1874 (1990) and WO90/06995), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed PCR (CP-PCR) (U.S. Pat. No. 4,437,975), arbitrarily primed PCR (AP-PCR) (U.S. Pat. Nos. 5,413,909, 5,861,245) and nucleic acid based sequence amplification (NABSA) (see U.S. Pat. Nos. 5,409,818, 5,554,517, and 6,063,603). Other amplification methods that may be used are described in U.S. Pat. Nos. 5,242,794, 5,494,810, 4,988,617 and in U.S. Ser. No. 09/854,317.

Additional methods of sample preparation and techniques for reducing the complexity of a nucleic acid sample are described in Dong et al., Genome Research 11, 1418 (2001), in U.S. Pat. Nos. 6,361,947, 6,391,592 and U.S. Ser. Nos. 09/916,135, 09/920,491 (US Patent Application Publication 20030096235), Ser. No. 09/910,292 (US Patent Application Publication 20030082543), and Ser. No. 10/013,598.

Detection of Biomarkers

The biomarkers of this invention can be detected by any suitable method. The methods described herein can be used individually or in combination for a more accurate detection of the biomarkers. Methods for conducting polynucleotide hybridization assays have been developed in the art. Hybridization assay procedures and conditions will vary depending on the application and are selected in accordance with the general binding methods known, including those referred to in: Sambrook and Russell, Molecular Cloning: A Laboratory Manual (3rd Ed., Cold Spring Harbor, N.Y., 2001); Berger and Kimmel, Methods in Enzymology, Vol. 152, Guide to Molecular Cloning Techniques (Academic Press, Inc., San Diego, Calif., 1987); Young and Davis, P.N.A.S., 80: 1194 (1983). Methods and apparatus for carrying out repeated and controlled hybridization reactions have been described in U.S. Pat. Nos. 5,871,928, 5,874,219, 6,045,996, 6,386,749, and 6,391,623. A data analysis algorithm (E-predict) for interpreting the hybridization results from an array is publicly available (see Urisman, 2005, Genome Biol 6:R78).

In one embodiment, the hybridized nucleic acids are detected by detecting one or more labels attached to, or incorporated within, the sample nucleic acids. The labels may be attached or incorporated by any of a number of means well known to those of skill in the art. In one embodiment, the label is simultaneously incorporated during the amplification step in the preparation of the sample nucleic acids. Thus, for example, PCR with labeled primers or labeled nucleotides will provide a labeled amplification product. In another embodiment, transcription amplification, as described above, using a labeled nucleotide (e.g., fluorescein-labeled UTP and/or CTP) incorporates a label into the transcribed nucleic acids. In another embodiment, PCR amplification products are fragmented and labeled by terminal deoxytransferase and labeled dNTPs. Alternatively, a label may be added directly to the original nucleic acid sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product after the amplification is completed. Means of attaching labels to nucleic acids are well known to those of skill in the art and include, for example, nick translation or end-labeling (e.g., with a labeled RNA) by kinasing the nucleic acid and subsequent attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore). In another embodiment, a label is added to the end of fragments using terminal deoxytransferase.

Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include, but are not limited to: biotin for staining with labeled streptavidin conjugate; anti-biotin antibodies; magnetic beads (e.g., Dynabeads™); fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like); radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P); phosphorescent labels; enzymes (e.g., horse radish peroxidase, alkaline phosphatase and others commonly used in an ELISA); and colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837, 3,850,752, 3,939,350, 3,996,345, 4,277,437, 4,275,149 and 4,366,241.

Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels may be detected using photographic film or scintillation counters; fluorescent markers may be detected using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label.

Methods and apparatus for signal detection and processing of intensity data are disclosed in, for example, U.S. Pat. Nos. 5,143,854, 5,547,839, 5,578,832, 5,631,734, 5,800,992, 5,834,758, 5,856,092, 5,902,723, 5,936,324, 5,981,956, 6,025,601, 6,090,555, 6,141,096, 6,185,030, 6,201,639, 6,218,803, and 6,225,625, in U.S. Ser. Nos. 10/389,194, 60/493,495 and in PCT Application PCT/US99/06097 (published as WO99/47964).

Detection by Biochip

In aspects of the invention, a sample is analyzed by means of a biochip (also known as a microarray). The nucleic acid molecules of the invention are useful as hybridizable array elements in a biochip. Biochips generally comprise solid substrates and have a generally planar surface, to which a capture reagent (also called an adsorbent or affinity reagent) is attached. Frequently, the surface of a biochip comprises a plurality of addressable locations, each of which has the capture reagent bound there.

The array elements are organized in an ordered fashion such that each element is present at a specified location on the substrate. Useful substrate materials include membranes, composed of paper, nylon or other materials, filters, chips, glass slides, and other solid supports. The ordered arrangement of the array elements allows hybridization patterns and intensities to be interpreted as expression levels of particular genes or proteins. Methods for making nucleic acid microarrays are known to the skilled artisan and are described, for example, in U.S. Pat. No. 5,837,832, Lockhart, et al. (Nat. Biotech. 14:1675-1680, 1996), and Schena, et al. (Proc. Natl. Acad. Sci. 93:10614-10619, 1996), herein incorporated by reference. U.S. Pat. Nos. 5,800,992 and 6,040,138 describe methods for making arrays of nucleic acid probes that can be used to detect the presence of a nucleic acid containing a specific nucleotide sequence. Methods of forming high-density arrays of nucleic acids, peptides and other polymer sequences with a minimal number of synthetic steps are known. The nucleic acid array can be synthesized on a solid substrate by a variety of methods, including, but not limited to, light-directed chemical coupling, and mechanically directed coupling. For additional descriptions and methods relating to resequencing arrays see U.S. patent application Ser. Nos. 10/658,879, 60/417,190, 09/381,480, 60/409,396, and U.S. Pat. Nos. 5,861,242, 6,027,880, 5,837,832, 6,723,503.

Detection by Nucleic Acid Biochip

In aspects of the invention, a sample is analyzed by means of a nucleic acid biochip (also known as a nucleic acid microarray). To produce a nucleic acid biochip, oligonucleotides may be synthesized or bound to the surface of a substrate using a chemical coupling procedure and an ink jet application apparatus, as described in PCT application WO95/251116 (Baldeschweiler et al.). Alternatively, a gridded array may be used to arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a vacuum system, thermal, UV, mechanical or chemical bonding procedure. Exemplary nucleic acid molecules useful in the invention include polynucleotides that specifically bind nucleic acid biomarkers to one or more pathogenic organisms, and fragments thereof.

A nucleic acid molecule (e.g., RNA or DNA) derived from a biological sample may be used to produce a hybridization probe as described herein. The biological samples are generally derived from a patient, e.g., as a bodily fluid (such as blood, blood serum, plasma, saliva, urine, ascites, cyst fluid, and the like); a homogenized tissue sample (e.g., a tissue sample obtained by biopsy); or a cell isolated from a patient sample. For some applications, cultured cells or other tissue preparations may be used. The mRNA is isolated according to standard methods, and cDNA is produced and used as a template to make complementary RNA suitable for hybridization. Such methods are well known in the art. The RNA is amplified in the presence of fluorescent nucleotides, and the labeled probes are then incubated with the microarray to allow the probe sequence to hybridize to complementary oligonucleotides bound to the biochip.

Incubation conditions are adjusted such that hybridization occurs with precise complementary matches or with various degrees of less complementarity depending on the degree of stringency employed. For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, less than about 500 mM NaCl and 50 mM trisodium citrate, or less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and most preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., of at least about 37° C., or of at least about 42° C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In a preferred embodiment, hybridization will occur at 30° C. in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In embodiments, hybridization will occur at 37° C. in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 μg/ml denatured salmon sperm DNA (ssDNA). In other embodiments, hybridization will occur at 42° C. in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 μg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.
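The salt concentrations recited above keep NaCl and trisodium citrate in a 10:1 ratio, which is the ratio of standard saline-sodium citrate (SSC) buffer, where 1x SSC is 150 mM NaCl and 15 mM trisodium citrate. The term SSC does not appear in the text above and is introduced here only as a convenience. A minimal sketch for expressing a recited condition as an SSC multiple follows; the function name is hypothetical.

```python
import math

SSC_1X_NACL_MM = 150.0     # NaCl concentration of 1x SSC, in mM
SSC_1X_CITRATE_MM = 15.0   # trisodium citrate concentration of 1x SSC, in mM


def ssc_fold(nacl_mm, citrate_mm):
    """Return the SSC multiple for a NaCl/citrate pair in the 10:1 SSC ratio."""
    fold_nacl = nacl_mm / SSC_1X_NACL_MM
    fold_citrate = citrate_mm / SSC_1X_CITRATE_MM
    if not math.isclose(fold_nacl, fold_citrate, rel_tol=1e-9):
        raise ValueError("concentrations are not in the SSC ratio")
    return fold_nacl


# Hybridization conditions recited above, as (NaCl mM, citrate mM) pairs.
hybridization_folds = [ssc_fold(n, c) for n, c in [(750, 75), (500, 50), (250, 25)]]
```

Under this convention, the 750 mM NaCl and 75 mM trisodium citrate condition corresponds to 5x SSC, and lower multiples correspond to more stringent salt conditions.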

The removal of nonhybridized probes may be accomplished, for example, by washing. The washing steps that follow hybridization can also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature. As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25° C., of at least about 42° C., or of at least about 68° C. In embodiments, wash steps will occur at 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 42° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In other embodiments, wash steps will occur at 68° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art.

Detection systems for measuring the absence, presence, and amount of hybridization for all of the distinct nucleic acid sequences are well known in the art. For example, simultaneous detection is described in Heller et al., Proc. Natl. Acad. Sci. 94:2150-2155, 1997. In embodiments, a scanner is used to determine the levels and patterns of fluorescence.

Diagnostic Assays

The present invention provides a number of diagnostic assays that are useful for the identification or characterization of a disease or disorder (e.g., infectious disease), or a propensity to develop such a condition. In one embodiment, a disease, disorder, or condition is characterized by quantifying the level of one or more biomarkers from one or more pathogenic organisms, including viruses, viroids, bacteria, fungi, helminths, and protozoa. While the examples provided below describe specific methods of detecting levels of these markers, the skilled artisan appreciates that the invention is not limited to such methods. Marker levels are quantifiable by any standard method; such methods include, but are not limited to, real-time PCR, Southern blot, PCR, and/or mass spectroscopy.

The level of any two or more of the markers described herein defines the marker profile of a disease, disorder, or condition. The level of marker is compared to a reference. In one embodiment, the reference is the level of marker present in a control sample obtained from a patient that does not have the disease, disorder, or condition. In another embodiment, the reference is a baseline level of marker present in a biologic sample derived from a patient prior to, during, or after treatment for a disease, disorder, or condition. In yet another embodiment, the reference is a standardized curve. The level of any one or more of the markers described herein (e.g., a combination of viral, bacterial, fungal, helminth, and/or protozoan biomarkers) is used, alone or in combination with other standard methods, to characterize the disease, disorder, or condition.

Implementation in Hardware and/or Software

The methods described herein can be implemented on general-purpose or specially programmed hardware or software. For example, the methods can be implemented by a computer readable medium. Accordingly, the present invention also provides software and/or a computer program product configured to perform the algorithms and/or methods according to any embodiment of the present invention. It is well known to a person skilled in the art how to configure software which can perform the algorithms and/or methods provided in the present invention. The computer-readable medium can be non-transitory and/or tangible. For example, the computer readable medium can be volatile memory (e.g., random access memory and the like) or non-volatile memory (e.g., read-only memory, hard disks, floppy discs, magnetic tape, optical discs, paper tape, punch cards, and the like). The computer executable instructions may be written in a suitable computer language or combination of several languages. Basic computational biology methods are described in, for example, Setubal and Meidanis et al., Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000); and Ouellette and Baxevanis, Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley & Sons, Inc., 2nd ed., 2001).

The present invention may also make use of various computer program products and software for a variety of purposes, such as probe design, management of data, analysis, and instrument operation. (See, U.S. Pat. Nos. 5,593,839, 5,795,716, 5,733,729, 5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127, 6,229,911 and 6,308,170.) Additionally, the present invention may have preferred embodiments that include methods for providing genetic information over networks such as the Internet as shown in U.S. Ser. Nos. 10/197,621, 10/063,559 (US Pub No 20020183936), Ser. Nos. 10/065,856, 10/065,868, 10/328,818, 10/328,872, 10/423,403, and 60/482,389.

Kits

The invention provides kits for the detection of a biomarker, which is indicative of the presence of one or more biological agents capable of causing a disease, disorder, or condition. The kits may be used for detecting the presence of multiple biological agents capable of causing one or more diseases or disorders. The kits may be used for the diagnosis of the disease, disorder, or condition. In some embodiments, the kit comprises a panel or collection of probes to nucleic acid biomarkers (e.g., PathoChip).

In some embodiments, the kit comprises one or more sterile containers which contain the panel of probes to nucleic acid biomarkers, or microarray chip. Such containers can be boxes, ampoules, bottles, vials, tubes, bags, pouches, blister-packs, or other suitable container forms known in the art. Such containers can be made of plastic, glass, laminated paper, metal foil, or other materials suitable for holding medicaments.

The instructions will generally include information about the use of the composition for the detection or diagnosis of a disease or disorder. In other embodiments, the instructions include at least one of the following: description of the therapeutic agent; dosage schedule and administration for treatment or prevention of disease, disorder, or symptoms thereof; precautions; warnings; indications; counter-indications; overdosage information; adverse reactions; animal pharmacology; clinical studies; and/or references. The instructions may be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card, or folder supplied in or with the container.

The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are well within the purview of the skilled artisan. Such techniques are explained fully in the literature, such as “Molecular Cloning: A Laboratory Manual”, second edition (Sambrook, 1989); “Oligonucleotide Synthesis” (Gait, 1984); “Animal Cell Culture” (Freshney, 1987); “Methods in Enzymology”; “Handbook of Experimental Immunology” (Weir, 1996); “Gene Transfer Vectors for Mammalian Cells” (Miller and Calos, 1987); “Current Protocols in Molecular Biology” (Ausubel, 1987); “PCR: The Polymerase Chain Reaction” (Mullis, 1994); and “Current Protocols in Immunology” (Coligan, 1991). These techniques are applicable to the production of the polynucleotides and polypeptides of the invention, and, as such, may be considered in making and practicing the invention. Particularly useful techniques for particular embodiments will be discussed in the sections that follow.

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the assay, screening, and therapeutic methods of the invention, and are not intended to limit the scope of what the inventors regard as their invention.

EXAMPLES

Example 1. Materials and Methods

Microarray Design

National Center for Biotechnology Information (NCBI) databases for Genome, Gene and Nucleotide accessions were queried (www.ncbi.nlm.nih.gov/pubmed) for all taxonomy=virus annotations, and for accessions from prokaryotic and eukaryotic human pathogen lists compiled by literature searches and web resources (www.niaid.nih.gov: Emerging and Re-emerging Infectious Diseases, Category A, B, and C Priority Pathogens). The resulting accessions were assembled into a non-redundant concatenation with 100 N nucleotide separators between accessions. This metagenome was divided into 58 “chromosomes”, each around 5-10 million nucleotides in length, and submitted to Agilent Technologies (Santa Clara, Calif., USA) as a custom design project. Probe sequences (50-60 nt) were selected using the Agilent array comparative genomic hybridization (aCGH) design algorithms, and then filtered for low likelihood of cross-hybridization to human genomic sequences.
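The concatenation step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name `build_metagenome`, the in-memory dictionary input, and the exact-duplicate redundancy check are all assumptions, and the division into 58 "chromosomes" is not shown. Only the 100 N separator comes from the text.

```python
from typing import Dict, List, Tuple

SEPARATOR = "N" * 100  # 100 N nucleotides between accessions, as stated in the text


def build_metagenome(accessions: Dict[str, str]) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Concatenate accession sequences with N separators (hypothetical helper).

    Returns the concatenated sequence and a map from accession ID to its
    half-open (start, end) coordinates within the concatenation.
    """
    parts: List[str] = []
    coords: Dict[str, Tuple[int, int]] = {}
    seen = set()
    offset = 0
    for acc_id, seq in accessions.items():
        if seq in seen:
            # Assumption: "non-redundant" is approximated by dropping exact duplicates.
            continue
        seen.add(seq)
        if parts:
            parts.append(SEPARATOR)
            offset += len(SEPARATOR)
        parts.append(seq)
        coords[acc_id] = (offset, offset + len(seq))
        offset += len(seq)
    return "".join(parts), coords
```

Keeping the coordinate map alongside the concatenated sequence lets a probe position in the metagenome be traced back to its source accession, and the N runs keep any 50-60 nt probe from spanning two accessions.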

Independently, low complexity regions in the metagenome were masked using mdust (http://doc.bioperl.org/bioperl-run/lib/Bio/Tools/Run/Mdust.html) followed by BLASTN 2.0MP-WashU³¹ identification of unique regions in viral accessions. Unique region criteria were 250-300 bp and <50 contiguous bp with >70% identity to a sequence in any other metagenome accession. Conserved viral regions were similarly identified using criteria of 70-300 bp and >70% identity to at least one other virus but not to human sequences.

All Agilent designed probes that mapped to unique or conserved viral regions, or any prokaryotic or eukaryotic pathogen accession, were added to the microarray design by default if fewer than 10 probes were available for the source accession. Otherwise, the probes were filtered for minimum inter-probe spacing of 100 bp and distribution that roughly covers the full length of each accession while limiting the number of probes to 10-20 per accession. The number of probes was not restricted for known oncogenic organisms, creating a saturation tiling set covering these accessions' entire sequences to the extent possible with all available Agilent designed probes. The microarray was supplemented with predesigned aCGH probes for 660 genes and 602 intergenic regions from the human genome, and probes for Saccharomyces cerevisiae. Probes and accession annotations are available in the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/).
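One way to picture the per-accession probe filter is the sketch below. The text gives the thresholds (keep all probes when fewer than 10 are available, 100 bp minimum spacing, at most 20 probes) but not the selection algorithm, so the greedy spacing pass, the even subsampling, the measurement of spacing between probe start positions, and the name `thin_probes` are assumptions made for illustration.

```python
from typing import List


def thin_probes(starts: List[int], min_spacing: int = 100,
                keep_all_below: int = 10, max_probes: int = 20) -> List[int]:
    """Select probe start positions for one accession (hypothetical helper)."""
    ordered = sorted(starts)
    if len(ordered) < keep_all_below:
        return ordered  # fewer than 10 probes: keep every one by default

    # Greedy pass: keep a probe only if it starts at least min_spacing bp
    # after the previously kept probe (assumption: spacing is start-to-start).
    spaced: List[int] = []
    for pos in ordered:
        if not spaced or pos - spaced[-1] >= min_spacing:
            spaced.append(pos)

    if len(spaced) <= max_probes:
        return spaced

    # Evenly subsample so the kept probes still span the accession end to end.
    step = (len(spaced) - 1) / (max_probes - 1)
    return [spaced[round(i * step)] for i in range(max_probes)]
```

Accessions for known oncogenic organisms would bypass this function entirely, since the text states that their probe count was not restricted.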

Sample Preparation

Total nucleic acid was extracted from the samples. Whole-genome amplifications (WGA) of genomic DNA and/or cDNA from random primed, reverse transcribed total RNA were performed with the Illustra GenomiPhi v2 kit (GE Healthcare Bio-Sciences, Pittsburgh, Pa., USA), Ovation WGA System (NuGEN, San Carlos, Calif., USA), and GenomePlex or TransPlex kits (WGA2, WTA2, Sigma-Aldrich, St. Louis, Mo., USA) using vendor recommended protocols and input amounts. Amplification products were purified with the QIAquick PCR Purification Kit (Qiagen), and 2 μg was used for Cy3 dye labeling by the SureTag Labeling Kit (Agilent). Cy5 dye labeling was performed on 2 μg of Human Reference DNA from the Agilent SureTag kit, without prior WGA (experiment 1, Table 1) or after WGA (all other experiments), as a control to report probe cross-hybridization to human (xhh) DNA. Labeled DNA was purified with SureTag kit spin columns and specific activities were calculated.

Microarray Production and Processing

SurePrint glass slide microarrays (Agilent) were manufactured with 60 nt DNA oligomers synthesized in 60,000 features on eight replicate arrays per slide. PathoChip v2a and v2b contained 60,000 probes to unique target regions or conserved plus saturation target regions, respectively. PathoChip v3 contained 37,704 probes to unique targets and 23,627 probes for conserved targets or to saturate known oncogenic agents.

Labeled samples were hybridized to microarrays as described in the Agilent Oligonucleotide Array-Based CGH for Genomic DNA Analysis protocol (version 7.2, G4410-90010). Master mixes containing aCGH blocking agent, HI-RPM hybridization buffer, and Cot-1 DNA (pilot assays only) were added to a mixture of the entire labeled test sample and the xhh control sample, denatured, and hybridized to arrays under 8-chamber gasket slides at 65° C. with 20 rpm rotation for 40 hours in an Agilent Hybridization Oven. Arrays were processed using Wash Procedure A, and scanned on an Agilent SureScan G4900DA Microarray Scanner.

Microarray Data Analysis

Scanned microarray images were analyzed using Agilent Feature Extraction software to calculate average pixel intensity and subtract local background for each feature. Images were manually examined to note any arrays affected by high background, scratches or other technical artifacts. Feature intensity distribution and channel balance were not used for quality control because most features are expected to have no signal, except for the control human probes.

Feature intensities for Cy3 and Cy5 channels were imported into Partek Genomics Suite (Partek Inc., St. Louis, Mo., USA). The average intensity for human intergenic control probes was calculated for co-hybridized test and xhh samples, and a scale factor was determined which would make the Cy5 xhh average equal to the Cy3 average. The Cy5 intensities for all PathoChip probes were then multiplied by the scale factor to normalize for differences in dye performance. Cy3/Cy5 ratios and Cy3-Cy5 subtractions were calculated for each probe to provide input for dual-channel or single-channel analysis pipelines, respectively. Accession Average (AccAvg) was defined as the average Cy3 or Cy5 intensity across all probes for one accession, and Accession Signal (AccSig) was defined as AccAvg(Cy3)-AccAvg(Cy5).
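The normalization and accession-level summaries defined above amount to a few lines of arithmetic. The sketch below is illustrative only and is not the Partek workflow itself; the function names and the list-of-tuples input are assumptions, while the scale factor, AccAvg, and AccSig follow the definitions in the text.

```python
from statistics import mean
from typing import Dict, List, Tuple


def dye_scale_factor(control_cy3: List[float], control_cy5: List[float]) -> float:
    """Scale factor that makes the Cy5 average of the human intergenic
    control probes equal to their Cy3 average."""
    return mean(control_cy3) / mean(control_cy5)


def accession_signal(probes: List[Tuple[float, float]], scale: float) -> Dict[str, float]:
    """Summarize one accession from its (Cy3, Cy5) probe intensities.

    Cy5 values are multiplied by the dye scale factor before averaging.
    """
    acc_avg_cy3 = mean(cy3 for cy3, _ in probes)
    acc_avg_cy5 = mean(cy5 * scale for _, cy5 in probes)
    return {
        "AccAvg_Cy3": acc_avg_cy3,
        "AccAvg_Cy5": acc_avg_cy5,
        "AccSig": acc_avg_cy3 - acc_avg_cy5,  # AccAvg(Cy3) - AccAvg(Cy5)
    }
```

For example, control averages of 150 (Cy3) and 75 (Cy5) give a scale factor of 2, so an accession whose probes read (400, 10) and (600, 20) has AccAvg(Cy3) = 500, AccAvg(Cy5) = 30, and AccSig = 470.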

Model-based Analysis of Tiling-arrays (MAT)³² as implemented in Partek was used for sliding window analysis of probe signals (Cy3 minus Cy5) for each sample. MAT parameters were p-value cutoff 0.99, window 5000 bp, minimum number of positive probes 5, and discard 0%. Candidate regions were classified by MAT score 30-300, 300-3000, and >3000.

Partek ANOVA tools were used to perform paired t-tests with multiple testing correction using all samples as replicates of the test condition and co-hybridized xhh DNA replicates as the control condition. Comparisons were performed at the accession level using AccAvg(Cy3) vs. AccAvg(Cy5) and at the individual probe level using Cy3 vs. Cy5 intensity values. Significance thresholds were set at a step-up false discovery rate <0.05 and fold-difference >2. An outlier analysis was also performed at accession and probe levels by calculating the standard deviation of AccSig or probe signal, and filtering for any values that were two or more standard deviations higher than the population mean.
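The two filters in this paragraph can be sketched in plain Python. Two assumptions are made loudly here: the "step-up" false discovery rate is taken to be the Benjamini-Hochberg procedure, and the standard deviation is the sample standard deviation. Neither detail is specified in the text, and the function names are hypothetical; the paired t-tests themselves are not reproduced.

```python
from statistics import mean, stdev
from typing import Dict, List


def high_outliers(signals: Dict[str, float], n_sd: float = 2.0) -> List[str]:
    """Names whose signal is n_sd or more standard deviations above the mean."""
    values = list(signals.values())
    cutoff = mean(values) + n_sd * stdev(values)  # assumption: sample SD
    return [name for name, value in signals.items() if value >= cutoff]


def step_up_fdr(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: this is the correction meant by "step-up" FDR in the text.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A probe or accession would then pass the significance threshold when its adjusted p-value is below 0.05 and its fold-difference exceeds 2, as stated above.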

Example 2. Microarray Design

The PathoChip design goals were to cover all public NCBI viral genomes, and sequences from selected microorganisms that are pathogenic to humans, using multiple probes to independent target sites in each species' genome (FIG. 2A). The resulting collection of sequences was assembled into a metagenome of 448.9 million bp containing 5206 accessions for over 4200 viruses, bacteria and eukaryotes. Agilent custom probe design algorithms built for comparative genomic hybridization applications were used to identify 5.5 million probes in the metagenome, over 3 million of which are predicted to have low risk of cross-hybridization with a human genome sequence. A subset of these probes that map to unique target regions were synthesized on PathoChip v2a microarrays, and a separate set that covers regions of sequence conservation between at least two viruses was synthesized on PathoChip v2b arrays (FIG. 2B). PathoChip v2b also included 2085 probes tiled throughout the lengths of 22 accessions for known cancer-associated organisms.

Pilot assays using Agilent reference human DNA showed median probe intensities of over 750 fluorescence units for probes to human sequences, and around 17 fluorescence units for non-human specific probes on PathoChip v2a and 120 fluorescence units for non-human conserved probes on PathoChip v2b (Table 1). These assays identified 6360 probes with fluorescence >150 that apparently hybridize to human DNA and were therefore removed from consideration for the PathoChip v3 design (FIG. 2B).

TABLE 1. Probes to Human Sequences

Experiment | PathoChip | Test (Cy3) | Xhh cross-hyb control (Cy5) | Amplification | Human probes, median Cy3 | Human probes, median Cy5 | Non-human probes, median Cy3 | Non-human probes, median Cy5
1 | v2a | Human gDNA, no Cot-1 | Human gDNA, no Cot-1 | none | 794 | 785 | 18 | 17
1 | v2b | Human gDNA, no Cot-1 | Human gDNA, no Cot-1 | none | 726 | 741 | 119 | 124
1 | v2a | Human gDNA + Cot-1 | Human gDNA + Cot-1 | none | 758 | 794 | 17 | 17
1 | v2b | Human gDNA + Cot-1 | Human gDNA + Cot-1 | none | 758 | 791 | 121 | 128
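The probe-removal step described in this example, in which non-human probes reading above 150 fluorescence units on human reference DNA were excluded from the v3 design, reduces to a threshold filter. The sketch below assumes a simple mapping from probe ID to intensity; the function name and the data layout are hypothetical, and only the cutoff of 150 comes from the text.

```python
from typing import Dict, List


def keep_low_cross_hyb(human_ref_intensity: Dict[str, float],
                       cutoff: float = 150.0) -> List[str]:
    """Return IDs of non-human probes whose signal on human reference DNA
    does not exceed the cutoff (probes with fluorescence > 150 are dropped)."""
    return [probe_id for probe_id, value in human_ref_intensity.items()
            if value <= cutoff]
```

The filter is meant for pathogen probes only; the human control probes are expected to read well above the cutoff and are retained by design.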

Example 3. Identification of Best Performing Probes for Targeted Species

Target organisms, including Legionella pneumophila, Yersinia enterocolitica, Escherichia coli, Vibrio cholerae, Clostridium perfringens, Salmonella enterica, Shigella flexneri, and Listeria monocytogenes, were grown in pure cultures, and 2 million cells of each target organism were pooled into one stock. Aliquots of diluted stock containing 1000, 100, 10 or 1 cell of each target were used for DNA+RNA extraction, amplification, and hybridization to PathoChip v3. All probes for each target were analyzed to identify probe subsets with high sensitivity for Legionella pneumophila, Yersinia enterocolitica, Escherichia coli, Vibrio cholerae, Clostridium perfringens, Salmonella enterica, Shigella flexneri, and Listeria monocytogenes (FIGS. 5A-5I). The selected probes were summed to report a single detection signal for each of Salmonella enterica, Listeria monocytogenes, Shigella flexneri, Clostridium perfringens, Yersinia enterocolitica, Vibrio cholerae, Escherichia coli O157:H7, and Legionella pneumophila (FIGS. 6A-6H). These results indicate that the PathoChip is able to detect the presence of various target organisms with high sensitivity.

Example 4. Assay Testing in the Presence of Human and Plant Background DNA+RNA

PathoChip v3 was tested using mixtures of target organisms with human and lettuce cells. Aliquots of diluted bacterial pools were mixed with 100,000 human cells and 100 mg lettuce. Control samples contained human and lettuce cells only. The entire sample volume was used for nucleic acid extraction. All DNA and RNA recovered from each sample was amplified, labeled, and hybridized to PathoChip v3. Detection signal was computed as the sum of selected probes for each test sample (solid line) and the control sample background (dotted line) (FIG. 7A). Detection signals were obtained for each of Escherichia coli O157:H7, Legionella pneumophila, Listeria monocytogenes, Salmonella enterica, Shigella flexneri, Vibrio cholerae, and Yersinia enterocolitica cells mixed with human and lettuce cells (FIGS. 7B-7H). A small number of pathogen cells in a background of a large number of host cells can be detected, but with less absolute signal compared to pure pathogen cultures (see, FIGS. 6B-6H). The difference between test signal and host-only control signal indicates detection ability. These results indicate that the PathoChip is able to detect the presence of various target organisms with high sensitivity in the presence of background RNA and DNA.
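The detection signal used in this example, a sum over the selected probes compared between the test sample and the host-only control, can be written as a short sketch. The function names, the dictionary layout, and the probe IDs in the usage note are hypothetical; the text does not state a numeric threshold for calling a detection, so none is applied here.

```python
from typing import Dict, Iterable


def detection_signal(intensities: Dict[str, float], selected: Iterable[str]) -> float:
    """Sum the intensities of the probes selected for one target organism."""
    return sum(intensities[probe_id] for probe_id in selected)


def detection_margin(test: Dict[str, float], control: Dict[str, float],
                     selected: Iterable[str]) -> float:
    """Test-sample detection signal minus the host-only control background."""
    probe_ids = list(selected)  # materialize so the IDs can be reused twice
    return detection_signal(test, probe_ids) - detection_signal(control, probe_ids)
```

With test intensities {"p1": 500, "p2": 300, "p3": 40}, control intensities {"p1": 20, "p2": 30, "p3": 35}, and selected probes p1 and p2, the margin is 800 - 50 = 750; a larger margin corresponds to a clearer separation between the solid and dotted lines described above.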

Example 5. Assay Testing in the Presence of Food Background DNA+RNA

PathoChip v3 was tested using mixtures of target organisms and various foods. Target organisms included Toxoplasma gondii, Vibrio cholerae, Yersinia enterocolitica, Escherichia coli O157:H7, Legionella pneumophila, Salmonella enterica, and Shigella flexneri. Aliquots of diluted pathogen pool (1000 or 100 cells per species) were mixed with milk (FIGS. 8A-8H), tomato (FIGS. 9A-9H), or clam (FIGS. 10A-10D). Control samples contained food only. The entire sample volume was used for nucleic acid extraction. All DNA and RNA recovered from each sample was amplified, labeled, and hybridized to PathoChip v3. Detection signal was computed as the sum of selected probes for each test sample and the control sample background. These results indicate that the PathoChip is able to detect the presence of various target organisms with high sensitivity in food.

Example 6. Detection and Identification of an Unknown Infectious Agent in a Patient Sample

An important factor in the clinical management of infectious diseases lies in the establishment of the identity of the etiologic agent or pathogen responsible for the infection. A rapid and accurate diagnosis informs treatment selection and has a direct impact on patient outcome. The PathoChip and the extraction protocol used with it provide a way to detect and analyze pathogens from any type of sample, thus overcoming challenges that limit current procedures for pathogen detection.

Analysis of a patient sample using the PathoChip detected a fungal agent that a hospital pathology lab was not able to identify. The patient was extremely ill when admitted to the hospital and presented symptoms of an infection. A brain sample from the patient was analyzed by querying the PathoChip with total nucleic acid isolated from the sample obtained.

Accession analysis of 60,000 probes of PathoChip identified strong signal associated with fungi of the Rhizomucor genus (FIG. 11). Heat map data showing hybridization signal of all the probes of the accessions selected by Accession Analysis were generated (FIG. 12A). Probes that were either undetected or were also present in the control were disregarded (FIG. 13B), leaving a number of probes indicating fungal hybridization signals (FIG. 12B). Indeed, the top 2 pathogens providing high accession signals were Rhizomucor pusillus strain NRRL28626 and Rhizomucor miehei (FIG. 13). The other pathogens having high accession signals were Rhizomucor pusillus and Rhodotorula laryngis, although the accession signals were substantially lower compared to the top 2 pathogens. As both Rhizomucor pusillus strain NRRL28626 and Rhizomucor miehei had the most prominent signals from the screen, these 2 agents were identified as the pathogens. Interestingly, two different species of this fungus were able to be identified and distinguished using the PathoChip, demonstrating the power of the technology. Thus, the PathoChip was shown to be useful as a clinical assay by identifying 2 related fungi associated with this type of infection. Based on this diagnosis, an antifungal treatment regimen could be selected for the patient.

Identification of the infectious agent was achieved in about 36 hours with conservative estimates in hybridization to prevent signal loss. It is expected that an optimized protocol can substantially reduce the time to detection, to within 24 hours and in any case no longer than 48 hours.

This demonstration shows the ability of the PathoChip to identify an unknown agent in a patient sample, an identification that was not possible in a clinical pathology lab of a major metropolitan hospital.

OTHER EMBODIMENTS

From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.

The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.

All patents, publications, and accession numbers mentioned in this specification are herein incorporated by reference to the same extent as if each independent patent, publication, and accession number was specifically and individually indicated to be incorporated by reference.

What is claimed is:
1. A method of selecting a set of probes for the detection of one or more target nucleic acid molecules, the method comprising: generating a metagenome comprising a plurality of nucleic acid molecules; and selecting nucleic acid probes that specifically target a unique nucleic acid sequence in the metagenome.
2. The method of claim 1, wherein the step of selecting the probes further comprises hybridizing one or more probes to the metagenome; and selecting probes that hybridize to one or more target nucleic acid molecules in the metagenome.
3. The method of claim 1, wherein the metagenome comprises sequences and/or genomes from a plurality of organisms or pathogens.
4. The method of claim 3, wherein the plurality of pathogens comprise two or more pathogens selected from the group consisting of viral, bacterial, fungal, helminth, and protozoan pathogens.
5. The method of claim 1, wherein the metagenome is discrete or concatenated.
6. The method of claim 1, wherein one or more steps is performed using a non-transitory computer readable medium containing program instructions executable by a processor.
7. A metagenome comprising a plurality of nucleic acid molecules, wherein the nucleic acid molecules comprise sequences or genomes from a plurality of organisms.
8. The metagenome of claim 7, wherein the organism is a pathogen.
9. The metagenome of claim 8, wherein the plurality of pathogens comprise two or more pathogens selected from the group consisting of viral, bacterial, fungal, helminth, and protozoan pathogens.
10. The metagenome of claim 9, comprising two or more pathogens.
11. The metagenome of claim 10, comprising about 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000 or more pathogens.
12. The metagenome of claim 7, comprising two or more genomes.
13. The metagenome of claim 12, comprising about 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000 or more genomes.
14. The metagenome of claim 7, wherein the metagenome is discrete or concatenated.
15. The metagenome of claim 7, comprising a spacer sequence between genomes.
16. A set of probes for the detection of one or more target nucleic acid molecules made by a method comprising: generating a metagenome comprising a plurality of nucleic acid molecules; and selecting nucleic acid probes that specifically target a unique nucleic acid sequence in the metagenome.
17. The set of probes of claim 16, wherein the probes are bound to a solid substrate.
18. The set of probes of claim 17, wherein the solid substrate comprises a panel, biochip, microarray, membrane, bead, or well.
19. The set of probes of claim 17, arranged in an array.
20. A method of detecting one or more target nucleic acid sequences in a sample, comprising the use of a set of probes, the set of probes made by a method comprising: generating a metagenome comprising a plurality of nucleic acid molecules; selecting nucleic acid probes that specifically target a unique nucleic acid sequence in the metagenome; the method comprising hybridizing one or more sample nucleic acid molecules to the set of probes, wherein the sample nucleic acid molecules are labeled with a detectable moiety; and detecting a signal from the binding of a nucleic acid probe and a sample nucleic acid molecule labeled with the detectable moiety, as compared to a reference.
21. The method of claim 20, wherein the sample nucleic acid molecules comprise DNA and RNA.
22. The method of claim 20, wherein the sample is an environmental or biological sample.
23. The method of claim 20, wherein the sample nucleic acid molecules are from a plurality of organisms or pathogens.
24. The method of claim 23, wherein the plurality of pathogens comprise two or more pathogens selected from the group consisting of viral, bacterial, fungal, helminth, and protozoan pathogens.
25. The method of claim 24, comprising about 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000 or more pathogens.